(57) ABSTRACT

A text-prompted speaker verification system that can be 
configured by users based on a desired level of security. A 
user is prompted for a multiple-digit (or multiple-word) 
password. The number of digits or words used for each 
password is defined by the system in accordance with a user 
set preferred level of security. The level of training required 
by the system is defined by the user in accordance with a 
preferred level of security. The set of words used to generate 
passwords can also be user configurable based upon the 
desired level of security. The level of security associated 
with the frequency of false accept errors versus false reject
errors is user configurable for each particular application. 

40 Claims, 6 Drawing Sheets 
USER CONFIGURABLE LEVELS OF SECURITY FOR A SPEAKER VERIFICATION SYSTEM

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to a system and method for
providing a speaker verification system with user selectable levels of
security.

2. Related Art

The increased use of consumer electronic devices and computer
controlled remote services has heightened concerns over security
issues. One of the primary security concerns is the risk of access by
unauthorized users. In order to safeguard against unauthorized use,
passwords and/or user identification codes are generally provided.

Therefore, users of these devices and services are required to
memorize and maintain a variety of passwords and/or user
identification codes (user IDs) to maintain security. For example,
user IDs and/or passwords (hereinafter separately and/or collectively
referred to as passwords) are generally required when using ATM cards,
credit cards, telephone calling cards, bank accounts, residential
security systems, personal computer systems, remote computer services,
voice mail systems, pagers, cellular telephones and personal digital
assistants (PDAs).

It has become apparent that users are finding it difficult and
inconvenient to memorize and maintain these passwords. This is
especially true for users of multiple devices and/or services. The
consequence of this inconvenience almost always results in some sort
of breach of security. For example, rather than commit multiple
passwords to memory, many users will write them down and thereby
increase the risk of misappropriation. In another example, this
inconvenience causes users to avoid setting up optional passwords
altogether. In yet another example, users tend to use trivial
passwords, such as their birth dates, that are easily compromised. In
addition, many users tend to use the same password [...]

Therefore, to alleviate this increasing and prevailing problem, what
is needed is a system and method for maintaining a high level of
security that avoids the inconveniences found in current password
authorization systems.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed toward a system and
method for providing a security system that avoids inconveniences and
problems found in conventional systems. The system and method of the
present invention use a text-prompted speaker verification system to
accept randomly generated verbal passwords from users. The system and
method of the present invention can be used with any type of
electronic device and/or any type of computer controlled local or
remote automated service. In fact, the present invention can be used
in any system where passwords may be utilized.

The present invention prompts the user (either verbally or textually)
for a multiple-digit (or multiple-word) password used for gaining
access to the system. The number of digits or words used for each
password is defined by the system in accordance with a preferred level
of user security. In addition, the level of training required by the
system is user configurable and based on the desired level of
security.

Further, the ease of system access is user configurable and based on a
desired level of security. Specifically, in accordance with a
preferred embodiment of the present invention, the types of errors
that may be generated by the system are user configurable. This is
accomplished by allowing the user to adjust the acceptable frequency
of errors between the two types of possible errors: false accepts and
false rejects.

Once a particular level of security is defined by a user, the user
gains access to the system by uttering a randomly generated password
as prompted by the system. The number of words or digits used for the
password, the amount of user training required, and the acceptable
error frequency and type are all configured by the user based on a
desired level of security as defined by the user for each particular
application.

BRIEF DESCRIPTION OF THE FIGURES

The present invention is described with reference to the accompanying
drawings, wherein:

FIG. 1 is a block diagram depicting a typical operational environment
according to an embodiment of the present invention.

FIG. 2 is a block diagram depicting an example of typical components
comprising a speaker verification module in accordance with an
embodiment of the present invention.

FIG. 3 is a graph depicting types of errors associated with a typical
speaker verification system.

FIG. 4 is a flowchart that is useful for describing an example process
that can be used to implement an embodiment of the present invention.

FIG. 5 is an example of user interface components that can be used to
implement the present invention.

FIG. 6 is a block diagram of a computer useful for implementing
components of the present invention.

In the figures, like reference numbers generally indicate identical,
functionally similar, and/or structurally similar elements.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is directed toward a system and method for
providing a speaker verification system with user selectable levels of
security. Automatic speech recognition is a rapidly evolving area in
the field of voice processing. This technology is generally divided
into two primary areas referred to as speech recognition and speaker
recognition. While speech recognition is concerned with the message
conveyed by the spoken utterance, speaker recognition is only
concerned with the identity of the person speaking the utterance. The
present invention is preferably used with a system that provides
speaker recognition, but can also be used with speech recognition
systems.

Speaker recognition refers to the capability to identify or verify a
user's identity based on his or her voice. Speaker recognition systems
can be further broken down into two categories, namely speaker
identification and speaker verification systems. In general, a speaker
identification system processes a voice sample to determine the
identity of a person within a group of persons "known" to the system.
Groups of persons are "known" to the system through a series of one or
more training sessions, where each "known" person's voice biometrics
are captured and stored.

Thus, a particular person is identified by the speaker identification
system by comparing a current speech sample
with the series of stored biometrics and selecting the person with the
closest match. The output of a speaker identification system is,
therefore, the identity of a particular speaker.

A speaker verification system is less complex than a speaker
identification system. Speaker verification systems typically process
a voice sample to determine whether it matches a single pre-identified
person. Thus, the output from a speaker verification system is binary
(i.e. either a match or a mismatch). In a preferred embodiment of the
present invention, a speaker verification system is used.

Accordingly, in the examples presented herein, a speaker verification
system is used to describe the present invention. However, in other
embodiments, different types of speech recognition systems can be
used, including, among other types, speaker identification systems.
Accordingly, the use of a speaker verification system to describe the
present invention should not be construed to limit the scope and
breadth of the present invention.

Further, generally, two types of speaker verification systems exist,
namely, text dependent and text independent systems. A text
independent speaker verification system has no restriction as to the
spoken utterance evaluated by the system. That is, these systems allow
the user to utter any word or phrase. The word or phrase is then
evaluated to determine whether a match or a mismatch occurs.

The problem with text independent systems is that they require complex
training. Further, these systems have greater computational and
storage requirements than text dependent systems. In addition, text
independent systems are less secure when used for security purposes
because any utterance of an enrolled speaker can result in a match.
This enables, for example, unauthorized users to break into a system
by obtaining any voice recording from an authorized user. As described
below, the preferred system randomly generates a different password on
each occasion and is therefore not prone to this type of break-in.

For these reasons, the present invention is preferably used in
conjunction with text dependent systems, as described below. This is
especially true for portable systems that require power, space and
computational resource conservation. However, in alternative
embodiments, text independent systems can also be used with the
present invention.

In general, text dependent systems require that the speaker utter a
fixed predefined phrase. Users generally train the system by uttering
one or more repetitions of the fixed predefined phrase used by the
system. In a preferred embodiment of the present invention, a text
dependent speaker verification system is used to randomly prompt the
user to utter a phrase to gain access to the system. This type of
system is referred to herein as a text-prompted speaker verification
system. The system selects phrases from a collection of predefined
words composed from a limited vocabulary set.

In one example, the limited vocabulary set comprises the digits zero
through nine. In another embodiment, different words are used such as
colors, names, and the like. In general, any set of words can be used
to comprise the limited vocabulary set.

The text-prompted speaker verification method is more complex, but
provides higher security than the fixed phrase method. For example,
using the text-prompted method, user passwords cannot be
misappropriated by tape recording a speaker and then playing it back
in response to the password prompt. In addition, the text-prompted
method is preferred because users are not required to memorize
passwords. This adds a much needed convenience that is not found in
conventional security systems.

In general, two types of errors are associated with speaker
verification systems, namely false accepts and false rejects. A false
accept occurs when an imposter is granted access to the system. A
false reject occurs when a true speaker is denied access. As described
below, a preferred embodiment of the present invention allows a user
to adjust one type of error condition at the expense of the other type
of error condition, in accordance with a preferred level of security.

For example, when a high security level is desired, users can
configure the system to generate a very low occurrence of false
accepts at the expense of a high occurrence of false rejects. In this
example, users would rather put up with the occasional false reject,
and have to repeat the password, than allow the occasional false
accept and risk unauthorized access.

Conversely, when a low security level is desired, users can configure
the system to generate a very low occurrence of false rejects at the
expense of an increased occurrence of false accepts. In this example,
users would rather allow the occasional false accept than deal with
high occurrences of false rejects, and thereby be forced to repeat the
password to gain access. Details pertaining to this unique user
selectable parameter are described below.

FIG. 1 is a block diagram depicting an operational environment in
accordance with one embodiment of the present invention. An electronic
device 2 comprises a speech input device 4, such as a microphone, that
is used to accept speech input 6 from a user (not shown). Examples of
electronic devices 2 include cellular telephones, PDAs, personal
computer systems, ATMs, landline telephones, dictation devices, or any
other type of electronic device.

It is noted that in many of the examples presented herein, a cellular
telephone (or cell phone) is used as the electronic device 2. The use
of a cell phone 2 to describe a preferred embodiment is for exemplary
purposes only and should not be construed to limit the scope and
breadth of the present invention.

In one embodiment, a speaker verification module 8 is embedded in the
electronic device 2 to perform security functions and control system
access. The speaker verification module 8 is used to process the
speech input 6 and verify the identity of the speaker. More
specifically, in one embodiment, the speaker verification module 8 is
used to authenticate a particular user's speech based on predefined
speech inputs stored in a storage device (not shown). The storage
device (not shown) is part of the speaker verification module 8.

In another embodiment, the speaker verification module 8 is not
embedded in the electronic device 2, but is remotely coupled to the
device 2 through a network 10. In this example embodiment, the speaker
verification module 8 is embedded in a server 11 that is connected to
the network 10.

The network 10 in this example represents any type of computer and/or
communications network and/or any combination thereof. For example, in
one embodiment of the present invention, the device 2 is a cellular
telephone and the network 10 is a cellular network coupled with a
computer network. The computer network can be a private network such
as a local area network, or a public network such as the Internet.

In another embodiment of the present invention, the electronic device
2 is any type of telephone. In this embodiment, the telephone 2 is
used to access a remote service on the server 11, such as a bank
account or the like.

The choice of whether to embed the speaker verification module 8 in
the local or remote device (2 or 11) depends on
several factors that should be considered when implementing particular
embodiments of the present invention. For example, in one embodiment,
where the device 2 is a cell phone, the speaker verification module 8
can reside in either the local or remote device.

In this example, an advantage of embedding the speaker verification
module 8 in the remote device 11 is the virtually unlimited
availability of computing power and storage space. A disadvantage of
locating the speaker verification module 8 in the server 11 is that
the speech signal 6 must travel through the network 10 before being
processed. Thus, using the remote embodiment 11, the speech signal 6
is highly susceptible to noise and signal degradation, which can
adversely affect speaker verification techniques.

Accordingly, because increased noise and signal degradation
considerably complicate the task of speaker verification, the speaker
verification module 8 is preferably embedded in the device 2. This
embodiment is referred to herein as the "local embodiment." Due to
practical limitations, however, the local embodiment can also be
problematic. For example, portable devices, such as cell phones and
the like, have limited space and power resources. Thus, in order to
implement the local embodiment, the speaker verification module 8 must
be sufficiently efficient so that it can be implemented using the
limited computing, power and storage resources available in a portable
device.

An example of a method that can be used in conjunction with the
present invention is disclosed in the co-pending U.S. patent
application Ser. No. 09/408,453, which is incorporated herein by
reference. An example of this technique disclosed by the above
referenced patent application is briefly described below.

FIG. 2 is a block diagram depicting an example of typical components
comprising a speaker verification module 8 in accordance with one
embodiment of the present invention. An analog to digital (A/D)
converter module 20 is used to convert a speech signal 6 into a
digital speech signal, using standard well-known sampling techniques.

A preprocessing and end-pointing module 22 is used to process the
digitized speech signal to filter the signal and remove unnecessary
items such as periods of silence. For example, periods of silence at
the beginning, the end and between words are typically discarded.
Further, the preprocessing module 22 typically filters the signal to
eliminate, for example, speech artifacts caused by the digitizing
process in the A/D converter 20. Consequently, the output from the
preprocessing module 22 is a more compact and cleaner digitized speech
signal.

Next, the output from the preprocessing module 22 is used as input to
a feature extraction module 24. The feature extraction module 24 takes
the filtered digitized speech signal and converts it to feature
vectors. In this example, feature vectors are the result of a process
that extracts relevant portions of the digitized speech sample. The
contents of the feature vectors include spectral information. Thus, in
a typical application, multiple speech samples are compressed into a
much smaller number of samples comprising spectral information.

The next path taken in the process depends on whether the process is
executed during an enrollment phase or during a speaker verification
phase. The enrollment phase is used to train the system for particular
users. The speaker verification phase is used to authenticate users
during the operation of the security system. In this example, the
enrollment phase is represented by the upper portion of FIG. 2, and
the speaker verification phase is represented by the lower portion.

Accordingly, during the enrollment phase, a classifier module 25 is
used to create a speaker model. The input speech streams are used to
extract the user's voice biometrics and create a speaker model 34
therefrom. Many different methods can be used to create the speaker
model 34. For example, the classifier may contain a neural network
that is used for the purpose of creating the speaker model 34. Other
well known techniques that can be used include the Hidden Markov Model
(HMM) and dynamic time warping (DTW).

Whereas the speaker model 34 represents a particular speaker's voice,
the cohort model 32 represents the voice of all other people. In
particular, the cohort model 32 is used to distinguish a particular
speaker's voice from all others. Utterances are compared against both
the cohort model 32 and the speaker model 34 in two separate
comparisons. Ideally, an authorized user will score high against the
speaker model 34 and low against the cohort model 32. The threshold
database 30 is used to store values associated with these scores to
determine whether a match or mismatch occurs.

Accordingly, two models 34 and 32 are used rather than a single
speaker model 34 to increase the reliability of the speaker
verification system 8. The interaction between the threshold database
30, the cohort model 32, and the speaker model 34 is best illustrated
with an example. In this example, it is assumed that the system 8
assigns a score from 0 to 100 for each speech utterance comparison
made against a particular model. That is, for a perfect score, one
would expect to score 100 against the speaker model 34, and zero
against the cohort model. In reality, such scores are rarely achieved,
as the example below illustrates.

During a typical training phase, the speaker verification system 8
prompts the user to utter a certain word or phrase a multiple number
of times. For the purposes of this example, it is assumed that the
following scores are a result of this type of training session.

[Table of example training scores omitted in source.]

In this example, the average score for the utterances when compared
against the speaker model is 64. The average score for the same
utterances when compared against the cohort model is 44. These values
can be used to set threshold values for match determination. Thus, for
example, one could set a threshold value of 44 for the cohort model,
and a threshold value of 64 for the speaker model. Using this
simplistic approach, a match is established if the score from a future
utterance of the same word or phrase is 44 or below against the cohort
model, and 64 or above against the speaker model.

In practice, however, this simplistic scheme is not very efficient.
For example, different conditions, such as background noises, etc.,
and normal variations in people's voices, can result in dramatically
different scores on different occasions. However, taking these
changing conditions into account, it has been determined that the
difference in scores between the cohort model 32 and the speaker model
34 remains relatively constant.
Therefore, this differential value (i.e. the difference in
scores between the cohort and speaker models), rather than 
the raw scores, is used to determine matches. For example, 
suppose that an average differential between the scores 
against the speaker and cohort models is 20 percent. In this 
case, a match will be found for future utterances if the 
speaker score is at least 20 percent greater than the cohort 
score. 
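
The differential rule described above can be sketched as follows. This is a minimal illustration, not the method of the referenced co-pending application: the function names are hypothetical, the training scores are invented so that their averages equal the 64 and 44 quoted in the example, and the "20 percent" differential is read as 20 points on the assumed 0 to 100 score scale.

```python
from statistics import mean


def average_differential(speaker_scores, cohort_scores):
    """Mean gap between speaker-model and cohort-model scores over training."""
    if not speaker_scores or len(speaker_scores) != len(cohort_scores):
        raise ValueError("need equal-length, non-empty score lists")
    return mean(s - c for s, c in zip(speaker_scores, cohort_scores))


def is_match(speaker_score, cohort_score, required_differential):
    """Accept when the speaker score beats the cohort score by the required gap."""
    return (speaker_score - cohort_score) >= required_differential


# Hypothetical training scores; only their averages (64 and 44) come from the text.
speaker_training = [60, 66, 64, 62, 68]
cohort_training = [40, 46, 44, 42, 48]
required = average_differential(speaker_training, cohort_training)

# A noisy utterance scores below the raw training averages but keeps its gap.
noisy_genuine = is_match(50, 28, required)
# An impostor scores well on both models, so the gap is small.
impostor = is_match(55, 50, required)
print(required, noisy_genuine, impostor)
```

Comparing the gap rather than the raw scores is what lets the noisy genuine utterance pass even though its speaker score of 50 is below the raw threshold of 64 used in the simplistic scheme.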

Referring back now to FIG. 2, the speaker verification
phase of the speaker verification module 8 will now be
described. During the speaker verification phase, the feature
vectors constructed by the feature extraction module 24 are
input into the pattern matching module 26.

As shown, the pattern matching module 26 is coupled
with both the cohort model 32 and the speaker model 34. The
pattern matching module 26 generates one score against the
speaker model 34 and one score against the cohort model
32.

Next, as indicated, the threshold comparison module 28
determines whether to accept or reject the speaker, based on
the threshold database 30 and the concepts described above
(and in the above referenced co-pending patent application).
The output from the threshold comparison module 28 is
either an accept or a reject decision.

Note that this is just one example of a means for imple- 
menting a speaker verification system 8. Other methods can 
be used. It is noted however, that the present invention can 
be used with any type of known or future speaker verifica- 
tion system. In fact, the present invention can also be used 
with other forms of speaker or speech recognition systems. 
After reading the present disclosure, the adaptability of the 
present invention to other forms of speech recognition 
systems would be apparent to persons skilled in the relevant 
art(s). Accordingly, the examples used herein should not be 
construed to limit the scope and breadth of the present 
invention. 

FIG. 3 is a graph depicting types of errors associated with 
a typical speaker verification system. Because this is a 
binary system, i.e. an accept or reject decision, the types of 
errors are false accepts 42, as shown on the vertical axis, and 
false rejects 48 as shown on the horizontal axis. A false 
accept occurs when an imposter is recognized by the system 
8 as an authorized user. A false reject occurs when an 
authorized speaker is not recognized by the system 8 and is 
therefore not allowed to gain access. 

By adjusting the threshold 30, as described above, a 
particular security level can be provided by allowing one 
type of error to be prevalent over the other type of error. For 
example, as shown by reference point 50, by setting a high 
false accept threshold, the system 8 would generate very low 
occurrences (2%) of false rejects and very high occurrences 
(20%) of false accepts. A threshold setting at this level 
makes it highly likely that an imposter can gain access to the 
system. At the same time, however, a false reject by the 
system is very infrequent. 

Accordingly, a high level of security is realized when the 
speaker verification module 8 is programmed such that very 
strict values are used to determine whether a match occurs. 
That is, scores that are very close to the scores achieved 
during training are used to determine matches. The down- 
side to using this level of security, of course, is that it opens 
up the possibility of having a high occurrence of false 
rejects. 

Conversely, if the scoring standard is more relaxed, and
scores are allowed to deviate from the scores achieved during
training, a low level of security is realized. The downside to
this approach is that a higher number of false accepts is
possible.
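
The trade-off between the two error types can be made concrete with a small sketch. It assumes an accept rule of the form "score differential at or above a threshold"; the function name and all score values are hypothetical and serve only to show how moving the threshold shifts errors from one type to the other.

```python
def error_rates(genuine_diffs, impostor_diffs, threshold):
    """Return (false_accept_rate, false_reject_rate) for the rule diff >= threshold."""
    false_accepts = sum(1 for d in impostor_diffs if d >= threshold)
    false_rejects = sum(1 for d in genuine_diffs if d < threshold)
    return false_accepts / len(impostor_diffs), false_rejects / len(genuine_diffs)


# Hypothetical differentials (speaker score minus cohort score) from test trials.
genuine = [25, 22, 18, 30, 15, 21, 27, 12, 24, 19]
impostor = [5, 14, -3, 9, 17, 2, 11, 20, -8, 6]

# A relaxed, a balanced, and a strict threshold setting.
for threshold in (10, 16, 22):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:>2}  false accepts={far:.0%}  false rejects={frr:.0%}")
```

With this invented data, the relaxed setting admits impostors but never rejects the true speaker, and the strict setting does the reverse, which mirrors the two ends of the curve in FIG. 3.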
But in some applications, users would be more willing to
risk access by unauthorized users than to deal with a high
number of false rejections, each of which forces the user to
reattempt system access. This would be true for low level
security applications.

Towards the other end of the spectrum, as shown by
reference point 54, a threshold setting yields high
occurrences (25%) of false rejects and low occurrences
(2%) of false accepts. This represents a very high level of
security, where a user would rather put up with false rejects
than risk even low occurrences of false entries into
the system.

A significant advantage of the present invention is that the
user can select a desired level of security for each
application. For example, for one application, such as a bank
account or the like, a user may select a very high level of
security. In this case, the user will likely select a level of
security near point 54 on the high security end of the
spectrum.

Similarly, for a less secure application such as a home
computer system or the like, a user may select a low level
of security. In this case, the user will select a level of security
near point 50 on the low end of the security spectrum. For
another application, such as a cell phone, a user may select
a medium level of security at some point in between the high
and low ends, such as point 52, for example.

FIG. 4 is a flowchart that is useful for describing an
example process that can be used to implement the present
invention. The process begins with step 62. In step 62, the
user is asked to select a particular level of security. In this 
example, the user is presented with a choice of either a high, 
medium or low security level. In another embodiment, the 
user is given more flexibility. For example, in one 
embodiment, the user is asked for a level of security from 1 
to 100. 
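
Step 62 can be sketched as a small normalization routine that accepts either form of selection. The function name is hypothetical, and the band edges used to fold a 1 to 100 value into the three named levels are an assumption made for illustration; the text does not say how a numeric level maps to system behavior.

```python
def normalize_level(selection):
    """Map "high"/"medium"/"low" or an integer from 1 to 100 to a named level."""
    if isinstance(selection, str):
        name = selection.strip().lower()
        if name in ("high", "medium", "low"):
            return name
        raise ValueError(f"unknown security level: {selection!r}")
    if isinstance(selection, bool) or not isinstance(selection, int):
        raise TypeError("selection must be a level name or an integer")
    if not 1 <= selection <= 100:
        raise ValueError("numeric level must be between 1 and 100")
    # Assumed band edges: the source does not define this mapping.
    if selection >= 67:
        return "high"
    if selection >= 34:
        return "medium"
    return "low"


# One level per application, following the examples given for FIG. 3.
app_levels = {
    "bank account": normalize_level("high"),
    "cell phone": normalize_level(50),
    "home computer": normalize_level("low"),
}
print(app_levels)
```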

Next, as indicated by steps 64-68, three user configurable
parameters are assigned values based on the level of security
selected in step 62. The first parameter is the number of
repetitions used to train the system during the training
phase. Accordingly, as the level of security increases, so
does the number of repetitions required to train the system.

For example, as shown in step 64, if a high level of
security is selected, the number of repetitions is set to 5. As
shown in step 66, if a medium level of security is selected,
the number of repetitions is set to 3. As shown in step 68, if
a low level of security is selected, the number of repetitions
is set to 1.

The second parameter that is set in steps 64-68 is the
number of words (in this case, digits) used to create a
password. Generally, as the level of security increases, so
does the number of words used for the password. For
example, as shown in step 64, if a high level of security is
selected, the number of words (digits) is set to 5. As shown
in step 66, if a medium level of security is selected, the
number of words (digits) is set to 3. As shown in step 68, if
a low level of security is selected, the number of digits is set
to 1.
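
The first two parameters can be sketched as a lookup table plus a prompt generator. The repetition and digit counts (5, 3 and 1) come from steps 64, 66 and 68; the names `SecurityProfile`, `PROFILES` and `generate_prompt` are hypothetical, and drawing each word independently with the standard `secrets` module is one assumed way to produce a randomly generated password.

```python
import secrets
from dataclasses import dataclass

# Limited vocabulary set: the spoken digits zero through nine.
DIGIT_WORDS = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")


@dataclass(frozen=True)
class SecurityProfile:
    training_repetitions: int  # times each word is uttered during enrollment
    password_length: int       # number of words in each prompted password


# Values from steps 64 (high), 66 (medium) and 68 (low).
PROFILES = {
    "high": SecurityProfile(training_repetitions=5, password_length=5),
    "medium": SecurityProfile(training_repetitions=3, password_length=3),
    "low": SecurityProfile(training_repetitions=1, password_length=1),
}


def generate_prompt(level, vocabulary=DIGIT_WORDS):
    """Pick a fresh random word sequence for the user to speak."""
    profile = PROFILES[level]
    return [secrets.choice(vocabulary) for _ in range(profile.password_length)]


print(" ".join(generate_prompt("medium")))
```

Because a new sequence is drawn for every access attempt, a recording of an earlier attempt is unlikely to match the next prompt, and longer sequences make such a coincidence less likely.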

The third parameter that is set in steps 64-68 is the
adjustment made to the speaker verification module 8 based
on the desired error type, as described above. In particular,
using the example described above, the threshold values are
adjusted to accommodate the particular level of security
desired.
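
One way to picture this third parameter is as an offset applied to the score differential learned during training. The sketch below is an assumption for illustration only: the offsets in `LEVEL_OFFSETS` are invented, the function names are hypothetical, and a real system would have to measure its error rates to find settings that give the intended balance of false accepts and false rejects.

```python
# Hypothetical offsets, in score points, added to the trained differential.
LEVEL_OFFSETS = {"high": 5.0, "medium": 0.0, "low": -5.0}


def required_differential(trained_differential, level):
    """Stricter levels demand a larger speaker-versus-cohort gap."""
    return trained_differential + LEVEL_OFFSETS[level]


def accept(speaker_score, cohort_score, trained_differential, level):
    """Accept or reject one utterance at the given security level."""
    gap = speaker_score - cohort_score
    return gap >= required_differential(trained_differential, level)


# A borderline utterance with a gap of 17 against a trained differential of 20.
for level in ("low", "medium", "high"):
    print(level, accept(57, 40, 20, level))
```

The same borderline utterance is accepted at the low level and rejected at the medium and high levels, so raising the level trades false accepts for false rejects.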

Thus, as indicated by step 64, if a high level of security is selected
in step 62, the threshold value is chosen such that false rejects
occur more often than false accepts. As indicated by step 66, if a
medium level of security is selected, the threshold values are
adjusted such that the frequency of false rejects is approximately
equal to the frequency of false accepts. As indicated by step 68, if a
low level of security is selected, the threshold values are adjusted
such that false accepts occur more frequently than false rejects.

Next, as indicated by step 70, the process determines whether a
speaker model exists in the speaker model storage 34 that is equal to
or greater than the level of security selected. For example, if all of
the digits have already been learned from a previous training session,
in which the number of repetitions is at least equal to the number of
repetitions set above, then there is no need to train the system
again.

But if such a speaker model does not exist, then it is necessary to
create a speaker model as shown in step 72. This is generally
accomplished by prompting the user to repeat each digit a number of
times equal to the repetition parameter set above. Next, as indicated
by step 74, the process ends.

It is noted that in the above example, it is assumed that the
passwords generated by the system comprise one or more digits,
depending on the security level, where the digits are the words
"zero", "one", "two", "three", "four", "five", "six", "seven",
"eight", and "nine". However, in other embodiments, the passwords may
comprise any word or phrase.

In another embodiment, the set of words that can be used to create
passwords is also adjusted, based on the desired level of security.
For example, for a low security application, only the digits "one",
"two" and "three" are used. This, of course, also reduces the amount
of training time necessary. For higher levels of security, more
digits, or other words, phrases, etc. are added to the set of possible
words used to create passwords. Accordingly, the set of words used to
create passwords is yet another parameter that would be set in steps
64-68, based on the level of security selected in step 62.

FIG. 5 is an example of user interface components that [...]

[...] implemented in a computer system or other processing system. In
fact, in one embodiment, the invention is directed toward a computer
system capable of carrying out the functionality described herein. An
example computer system 101 is shown in FIG. 6. The computer system
101 includes one or more processors, such as processor 104. The
processor 104 is connected to a communication bus 102. Various
software embodiments are described in terms of this example system.
After reading this description, it will become apparent to a person
skilled in the relevant art how to implement the invention using other
computer systems and/or computer architectures.

Computer system 101 also includes a main memory 106, preferably random
access memory (RAM), and can also include a secondary memory 108. The
secondary memory 108 can include, for example, a hard disk drive 110
and/or a removable storage drive 112, representing a floppy disk
drive, a magnetic tape drive, an optical disk drive, etc. The
removable storage drive 112 reads from and/or writes to a removable
storage unit 114 in a well-known manner. Removable storage unit 114
represents a floppy disk, magnetic tape, optical disk, etc., which is
read by and written to by removable storage drive 112. As will be
appreciated, the removable storage unit 114 includes a computer usable
storage medium having stored therein computer software and/or data.

In alternative embodiments, secondary memory 108 may include other
similar means for allowing computer programs or other instructions to
be loaded into computer system 101. Such means can include, for
example, a removable storage unit 122 and an interface 120. Examples
of such can include a program cartridge and cartridge interface (such
as that found in video game devices), a removable memory chip (such as
an EPROM, or PROM) and associated socket, and other removable storage
units 122 and interfaces 120 which allow software and data to be
transferred from the removable storage unit 122 to computer system
101.

Computer system 101 can also include a communications interface 124.
Communications interface 124 allows soft-
can be used to implement the present invention, when used ware and data to be transferred between computer system 
with a device 2, that includes some kind of display screen, 101 an d external devices. Examples of communications 
such as a personal computer. In addition, even for embodi- interface 124 can include a modem, a network interface 
ments that lack a display screen, FIG. 5 is useful for (such as an Ethernet card), a communications port, a PCM- 
describing the type of user input data that can be used with 45 CIA slot and card, etc. Software and data transferred via 
any embodiment of the present invention. Other types of communications interface 124 are in the form of signals 
user interfaces that can be used with other types of devices which can be electronic, electromagnetic, optical or other 
would be apparent to persons skilled in the relevant art(s). signals capable of being received by communications inter- 
User interface 80 depicts a dialog box in which a user can face 124. These signals 126 are provided to communications 
select a desired number of repetitions required to train the 50 interface via a channel 128. This channel 128 carries signals 
system. The greater the number of repetitions, the greater the 126 and can be implemented using wire or cable, fiber 
security level. optics, a phone line, a cellular phone link, an RF link and 

User interface 82 depicts a dialog box that allows the user otner communications channels, 
to select a desired false accept frequency verses false reject In this document, the terms "computer program medium" 
frequency. In this example, the user drags a bar under the 55 and "computer usable medium" are used to generally refer 
graph to the desired location to set a customized level of to media such as removable storage device 112, a hard disk 
security. In this example, the graph in user interface 82 is installed in hard disk drive 110, and signals 126. These 
similar to the graph shown in FIG. 3. computer program products are means for providing soft- 
User interface 84 depicts a dialog box in which a user can ware to computer system 101. 
select a desired number of digits for generated passwords. 60 Computer programs (also called computer control logic) 
The greater the number of repetitions, the greater the secu- are stored in main memory and/or secondary memory 108. 
rity level. A similar interface can be used to select the Computer programs can also be received via communica- 
number of words in the set of words used to generate tions interface 124. Such computer programs, when 
passwords. In another embodiment, the user can select a list executed, enable the computer system 101 to perform the 
of words that can be used to generate passwords. 65 features of the present invention as discussed herein. In 
The present invention may be implemented using particular, the computer programs, when executed, enable 
hardware, software or a combination thereof and may be the processor 104 to perform the features of the present 
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invention. Accordingly, such computer programs represent 
controllers of the computer system 101. 
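
As an illustration of the kind of control logic such a computer program might contain, the following minimal sketch maps a selected security level to the parameters discussed for steps 64-68: the number of training repetitions, the password length, the set of words used to create passwords, and the decision threshold. The level names, the `SecurityConfig` and `configure` names, and every numeric value are assumptions made for this example; the specification does not give concrete values.

```python
from dataclasses import dataclass
from typing import Tuple

# The ten digit words named in the description.
DIGITS: Tuple[str, ...] = (
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
)


@dataclass(frozen=True)
class SecurityConfig:
    """Parameters tied to one security level (illustrative values only)."""
    repetitions: int             # training repetitions per word
    password_length: int         # number of words in each prompted password
    vocabulary: Tuple[str, ...]  # words the system may use in passwords
    threshold: float             # higher value => fewer false accepts


# Hypothetical presets; none of these numbers come from the specification.
PRESETS = {
    "low": SecurityConfig(1, 2, ("one", "two", "three"), -0.5),
    "medium": SecurityConfig(3, 4, DIGITS, 0.0),
    "high": SecurityConfig(5, 6, DIGITS, 0.5),
}


def configure(level: str) -> SecurityConfig:
    """Return the parameter set for the requested security level."""
    try:
        return PRESETS[level]
    except KeyError:
        raise ValueError(f"unknown security level: {level!r}") from None
```

A lookup table keeps the sketch short. An implementation that lets the user set each parameter separately, as in the dialog boxes of FIG. 5, would expose the four fields individually instead.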

In an embodiment where the invention is implemented 
using software, the software may be stored in a computer 
program product and loaded into computer system 101 using
removable storage drive 112, hard drive 110 or communi- 
cations interface 124. The control logic (software), when 
executed by the processor 104, causes the processor 104 to 
perform the functions of the invention as described herein. 
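
A software embodiment of steps 70 and 72, together with the random password prompt, could be sketched roughly as follows. This is only a sketch under stated assumptions: the level names, the dictionary standing in for speaker model storage 34, and the function names are all hypothetical, and no acoustic modeling is shown.

```python
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical ordering of security levels, lowest to highest.
SECURITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def needs_training(stored_levels: Dict[str, str], user_id: str, desired: str) -> bool:
    """Step 70: training is needed unless a stored model for this user
    has a security level equal to or greater than the desired level."""
    stored = stored_levels.get(user_id)
    if stored is None:
        return True
    return SECURITY_ORDER[stored] < SECURITY_ORDER[desired]


def training_prompts(vocabulary: Sequence[str], repetitions: int) -> List[str]:
    """Step 72: prompt each word `repetitions` times."""
    return [word for word in vocabulary for _ in range(repetitions)]


def generate_password(vocabulary: Sequence[str], length: int,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Pick `length` words at random from the configured vocabulary."""
    rng = rng or random.Random()
    return [rng.choice(vocabulary) for _ in range(length)]
```

The optional `rng` argument exists so that the sketch can be exercised reproducibly. Python's `random` module is not designed for security purposes, so a deployed system would more plausibly draw prompts from `random.SystemRandom` or the `secrets` module.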

In another embodiment, the invention is implemented
primarily in hardware using, for example, hardware com- 
ponents such as application specific integrated circuits 
(ASICs). Implementation of the hardware state machine so 
as to perform the functions described herein will be apparent 
to persons skilled in the relevant art(s).

In yet another embodiment, the invention is implemented 
using a combination of both hardware and software. 
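
Whichever implementation is chosen, the user-selected balance between false accepts and false rejects comes down to where a decision threshold is placed on the verification score. The sketch below shows that relationship with made-up scores. It assumes a score where higher means a better match to the claimed speaker and an accept rule of `score >= threshold`; the function name and the sample values are illustrative and are not taken from the specification.

```python
from typing import Sequence, Tuple


def error_rates(genuine_scores: Sequence[float],
                impostor_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at `threshold`.

    A trial is accepted when its score is >= threshold.
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up scores for illustration only.
GENUINE = [0.9, 0.8, 0.7, 0.4]
IMPOSTOR = [0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    # Raising the threshold trades false accepts for false rejects.
    for t in (0.25, 0.5, 0.75):
        far, frr = error_rates(GENUINE, IMPOSTOR, t)
        print(f"threshold={t:.2f}  false accepts={far:.2f}  false rejects={frr:.2f}")
```

With these sample scores, the lowest threshold yields more false accepts than false rejects (the low security case), the middle threshold makes the two rates equal, and the highest threshold yields more false rejects than false accepts (the high security case).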

While various embodiments of the present invention have 
been described above, it should be understood that they have
been presented by way of example only, and not limitation. 
Thus, the breadth and scope of the present invention should 
not be limited by any of the above-described exemplary 
embodiments, but should be defined only in accordance with 
the following claims and their equivalents.

What is claimed is: 

1. A method for configuring a user configurable level of 
security for use in a speech recognition security system, the 
method comprising the steps of: 

accepting an input from the user, the input identifying the
user and indicating a desired level of security for 
configuring the speech recognition security system; 

dynamically adjusting at least one of a set of parameters 
governing the speech recognition security system in 
accordance with the desired level of security for the
particular application; 

determining if the speech recognition security system 
includes a first speaker model created for the identified 
user, the first speaker model having a security level 
equal to or greater than the desired level of security;
and 

creating a second speaker model for the identified user by 
training the speech recognition security system if the 
determining step determines that the speech recognition
security system does not include the first speaker
model. 

2. The method of claim 1, wherein the step of dynamically 
adjusting at least one of a set of parameters governing the 
speech recognition security system includes the step of 
updating a parameter that defines the number of repetitions
used to train the speech recognition system in accordance 
with the desired level of security. 

3. The method of claim 2, wherein the defined number of
repetitions increases as the desired level of security increases.

4. The method of claim 2, wherein the defined number of
repetitions decreases as the desired level of security
decreases.

5. The method of claim 1, wherein the step of dynamically 
adjusting at least one of a set of parameters governing the 
speech recognition security system includes the step of
updating a parameter that defines the number of words used 

to create passwords in accordance with the desired level of 
security. 

6. The method of claim 5, wherein the defined number of
words increases as the desired level of security increases.

7. The method of claim 5, wherein the defined number of
words decreases as the desired level of security decreases.

8. The method of claim 1, wherein the step of dynamically
adjusting at least one of a set of parameters governing the 
speech recognition security system includes the step of 
updating a parameter that defines the frequency of false 
accept errors versus the frequency of false reject errors in 
accordance with the desired level of security. 

9. The method of claim 8, wherein the parameter is
updated such that the frequency of false rejects increases and
the frequency of false accepts decreases as the desired level
of security increases.

10. The method of claim 8, wherein the parameter is
updated such that the frequency of false rejects decreases and
the frequency of false accepts increases as the desired level
of security decreases.

11. The method of claim 1, wherein the step of dynami- 
cally adjusting at least one of a set of parameters governing 
the speech recognition security system includes the step of 
updating a parameter that defines the set of words used to 
create passwords in accordance with the desired level of 
security. 

12. The method of claim 11, wherein the defined set of
words used to create passwords increases in number as the
desired level of security increases.

13. The method of claim 11, wherein the defined set of
words used to create passwords decreases in number as the
desired level of security decreases.

14. The method of claim 1, wherein the speech recogni- 
tion system is a speaker verification system. 

15. The method of claim 14 wherein the speaker
verification system is a text dependent speaker verification
system.

16. The method of claim 14 wherein the speaker verifi- 
cation system is a text independent speaker verification 
system. 

17. The method of claim 1 further comprising the step of 
comparing a speech of said user against said first speaker 
model and a cohort model to identify said user. 

18. The method of claim 1 further comprising the step of 
comparing a speech of said user against said second speaker 
model and a cohort model to identify said user. 

19. The method of claim 1 further comprising: creating 
the second speaker model for the identified user based on the 
first speaker model if the determining step determines that 
the speech recognition security system includes the first 
speaker model. 

20. A user configurable speech recognition security sys- 
tem comprising: 

an input device for accepting an input from the user, the 
input identifying the user and indicating a desired level 
of security for configuring the speech recognition secu- 
rity system; 

an adjusting means for dynamically adjusting at least one 
of a set of parameters governing the speech recognition 
security system in accordance with the desired level of 
security; 

a determining means for determining if the speech rec- 
ognition security system includes a first speaker model 
created for the identified user, the first speaker model 
having a security level equal to or greater than the 
desired level of security; and 

a creating means for creating a second speaker model for 
the identified user by training the speech recognition 
security system if the determining means determines 
that the speech recognition security system does not 
include the first speaker model. 

21. The system of claim 20, wherein the adjusting means
includes means for updating a parameter that defines the
number of repetitions used to train the speech recognition
system in accordance with the desired level of security.

22. The system of claim 20, wherein the adjusting means
includes means for updating a parameter that defines the
number of words used to create passwords in accordance
with the desired level of security.

23. The system of claim 20, wherein the adjusting means
includes means for updating a parameter that defines the
frequency of false accept errors versus the frequency of false
reject errors in accordance with the desired level of security.

24. The system of claim 20, wherein the adjusting means
includes means for updating a parameter that defines a set of
words used to create passwords in accordance with the
desired level of security, wherein the number of words in the
set is proportional to the desired level of security.

25. The system of claim 20, wherein the electronic device
is a telephone.

26. The system of claim 25, wherein the speech
recognition system is installed on a remote server coupled with a
telephone network.

27. The system of claim 20, wherein the electronic device
is a cellular handset.

28. The system of claim 20, wherein the electronic device
is a personal digital assistant.

29. The system of claim 20, wherein the electronic device
is a personal computer system.

30. The system of claim 20 further comprising a
comparison means for comparing a speech of said user against
said first speaker model and a cohort model to identify said
user.

31. The system of claim 20 further comprising a
comparison means for comparing a speech of said user against
said second speaker model and a cohort model to identify
said user.

32. The system of claim 20, wherein the creating means
creates the second speaker model for the identified user
based on the first speaker model if the determining step
determines that the speech recognition security system
includes the first speaker model.

33. A computer program product comprising a computer
useable medium having computer program logic stored
therein, said computer program logic for enabling a
computer to configure a user configurable level of security for
use in a speech recognition security system, said computer
program logic comprising:

an input device for accepting input from the user, the input
identifying the user and indicating a desired level of
security for configuring the speech recognition security
system;

an adjusting means for dynamically adjusting at least one
of a set of parameters governing the speech recognition
security system in accordance with the desired level of
security;

a determining means for determining if the speech
recognition security system includes a first speaker model
created for the identified user, the first speaker model
having a security level equal to or greater than the
desired level of security; and

a creating means for creating a second speaker model for
the identified user by training the speech recognition
security system if the determining means determines
that the speech recognition security system does not
include the first speaker model.

34. The computer program product of claim 33, wherein
the adjustment means includes means for updating a
parameter that defines the number of repetitions used to train the
speech recognition system in accordance with the desired
level of security.

35. The computer program product of claim 33, wherein
the adjustment means includes means for updating a
parameter that defines a number of words used to create passwords
in accordance with the desired level of security.

36. The computer program product of claim 33, wherein
the adjustment means includes means for updating a
parameter that defines the frequency of false accept errors versus
the frequency of false reject errors in accordance with the
desired level of security.

37. The computer program product of claim 33, wherein
the adjustment means includes means for updating a
parameter that defines the set of words used to create passwords in
accordance with the desired level of security.

38. The computer program product of claim 33 further
comprising a comparison means for comparing a speech of
said user against said first speaker model and a cohort model
to identify said user.

39. The computer program product of claim 33 further
comprising a comparison means for comparing a speech of
said user against said second speaker model and a cohort
model to identify said user.

40. The computer program product of claim 33, wherein
the creating means creates the second speaker model for the
identified user based on the first speaker model if the
determining step determines that the speech recognition
security system includes the first speaker model.

*****
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